Almost Optimal Variance-Constrained Best Arm Identification
Abstract
We design and analyze Variance-Aware-Lower Upper Confidence Bound (VA-LUCB), a parameter-free algorithm, for identifying the best arm under the fixed-confidence setup and under a stringent constraint that the variance of the chosen arm is strictly smaller than a given threshold. An upper bound on VA-LUCB’s sample complexity is shown to be characterized by a fundamental variance-aware hardness quantity $H_{\mathrm{VA}}$. By proving an information-theoretic lower bound, we show that the sample complexity of VA-LUCB is optimal up to a factor logarithmic in $H_{\mathrm{VA}}$. Extensive experiments corroborate the dependence of the sample complexity on the various terms in $H_{\mathrm{VA}}$. By comparing the empirical performance of VA-LUCB to a close competitor, RiskAverse-UCB-BAI by David et al. (2018), our experiments suggest that VA-LUCB has the lowest sample complexity for this class of risk-constrained best arm identification problems, especially for the riskiest instances.
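To make the identification target concrete, here is a minimal Python sketch of what "best arm under a variance constraint" means when the true means and variances are known. It is not VA-LUCB, which must learn these quantities from samples; the function name `best_feasible_arm` and the example numbers are hypothetical and serve only as illustration.

```python
from typing import Optional, Sequence


def best_feasible_arm(
    means: Sequence[float],
    variances: Sequence[float],
    threshold: float,
) -> Optional[int]:
    """Return the index of the highest-mean arm whose variance is
    strictly below `threshold`, or None if no arm is feasible.

    Assumption: true means and variances are known. This only defines
    the target of the problem; it is not a sampling algorithm.
    """
    # An arm is feasible only if its variance is strictly smaller
    # than the threshold, matching the constraint in the abstract.
    feasible = [i for i, v in enumerate(variances) if v < threshold]
    if not feasible:
        return None
    # Among the feasible arms, pick the one with the largest mean.
    return max(feasible, key=lambda i: means[i])


if __name__ == "__main__":
    # Hypothetical instance: arm 0 has the largest mean but is too risky.
    means = [0.9, 0.7, 0.5]
    variances = [0.30, 0.10, 0.05]
    print(best_feasible_arm(means, variances, threshold=0.2))
```

In this hypothetical instance the arm with the largest mean violates the variance threshold, so the target is the best of the remaining arms rather than the overall best.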
Similar resources
Optimal Best Arm Identification with Fixed Confidence
We give a complete characterization of the complexity of best-arm identification in one-parameter bandit problems. We prove a new, tight lower bound on the sample complexity. We propose the ‘Track-and-Stop’ strategy, which we prove to be asymptotically optimal. It consists in a new sampling rule (which tracks the optimal proportions of arm draws highlighted by the lower bound) and in a stopping...
On the Optimal Sample Complexity for Best Arm Identification
We study the best arm identification (Best-1-Arm) problem, which is defined as follows. We are given n stochastic bandit arms. The i-th arm has a reward distribution D_i with an unknown mean μ_i. Upon each play of the i-th arm, we can get a reward, sampled i.i.d. from D_i. We would like to identify the arm with largest mean with probability at least 1 − δ, using as few samples as possible. We also st...
Towards Instance Optimal Bounds for Best Arm Identification
In the classical best arm identification (Best-1-Arm) problem, we are given n stochastic bandit arms, each associated with a reward distribution with an unknown mean. Upon each play of an arm, we can get a reward sampled i.i.d. from its reward distribution. We would like to identify the arm with the largest mean with probability at least 1 − δ, using as few samples as possible. The problem has ...
Multi-Bandit Best Arm Identification
We study the problem of identifying the best arm in each of the bandits in a multi-bandit multi-armed setting. We first propose an algorithm called Gap-based Exploration (GapE) that focuses on the arms whose mean is close to the mean of the best arm in the same bandit (i.e., small gap). We then introduce an algorithm, called GapE-V, which takes into account the variance of the arms in addition t...
Open Problem: Best Arm Identification: Almost Instance-Wise Optimality and the Gap Entropy Conjecture
The best arm identification problem (BEST-1-ARM) is the most basic pure exploration problem in stochastic multi-armed bandits. The problem has a long history and attracted significant attention for the last decade. However, we do not yet have a complete understanding of the optimal sample complexity of the problem: The state-of-the-art algorithms achieve a sample complexity of O(∑_{i=2}^{n} Δ_i^{−2} ...
Journal
Journal title: IEEE Transactions on Information Theory
Year: 2023
ISSN: 0018-9448, 1557-9654
DOI: https://doi.org/10.1109/tit.2022.3222231